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Abstract — Many signal processing problems can be solved by 
maximizing the fitness of a segmented model over all possible 
partitions of the data interval. This letter describes a simple but 
powerful algorithm that searches the exponentially large space 
of partitions of N data points in time 0(N 2 ). The algorithm 
is guaranteed to find the exact global optimum, automatically 
determines the model order (the number of segments), has a 
convenient real-time mode, can be extended to higher dimensional 
data spaces, and solves a surprising variety of problems in 
signal detection and characterization, density estimation, cluster 
analysis and classification. 

Index Terms — signal detection, density estimation, optimiza- 
tion, Bayesian modeling, histograms, cluster analysis 
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I. Introduction: The Problem 

A variety of signal processing and related problems can 
be viewed as the search for an optimal partition of data 
given on a time interval /. For example, one may estimate 
a segmented model by maximizing some measure of model 
fitness 1 defined on partitions of /. Since the space of all 
partitions of a continuum is infinite, it is advantageous to 
discretize the interval. Often the data points themselves, say 



X n ,n = l,2, 



(1) 



naturally subdivide I into subintervals - which we call data 
cells. We avoid a precise definition because many types of 
data cells are possible. Common examples are counts in bins, 
measurements at a set of sample times (evenly spaced or not), 
and event or point data. The underlying idea is that restricting 
consideration to the finite space of partitions whose elements 
are sets of data cells will result in no significant loss of 
information or of resolution in the independent variable. 
A partition P of an interval / is a set of M blocks, 



P(J) = {B m ,meM} ,M= {1,2,. ..M} , 



(2) 



1 This concept goes by many names, including fitness, cost, goodness of fit, 
loss, penalty, objective function, risk etc. Here we assume that the data analysis 
problem can be phrased in terms of some fitness that is to be maximized. 



where the blocks are sets of data cells defined by index sets 



(3) 



satisfying the usual conditions, 1J B m = I and B m f] B m i = 

m 

if to ^ to'. Here means a set of zero area, since the ID 
boundaries are not relevant here. We impose connectedness 
too — i.e. that there be no gaps between the cells comprising 
a given block. M, the number of blocks, must satisfy < 
M < N. Partitions will be denoted in boldface, and refer to 
the interval / unless otherwise stated. Define P* to be the 
(finite) set of all partitions of / into blocks. 

Take as given an additive fitness function that assigns a value 
to any partition P 6 P* in the form 



M 



V(P) = ]T g(B m ) 



(4) 



where g(B m ) is the fitness of block B m . Computationally, 
the data cells must be represented by a data structure that 
contains sufficient statistics for the model - i.e. all information 
necessary to determine g for any block. 

We exhibit an efficient 0(N 2 ) dynamic programming algo- 
rithm that finds an optimal partition p max g p* : V(P max ) > 
V(P) for all partitions P 6 P*. 

Scargle [21] proposed two greedy iterative algorithms for 
finding near-optimal partitions: one top-down (optimally di- 
vide / into two parts; recursively do the same to each such 
part) the other bottom-up (merge adjacent data cells). In 
both cases Bayesian model comparison provides effective 
fitness functions and halting criteria, implementing an 0(N 2 ) 
procedure for data spaces of 1, 2 and higher dimensions — 
hence the term Bayesian Blocks [21]. But in practice these 
greedy algorithms often find significantly suboptimal parti- 
tions, motivating the development reported here. 

The dynamic programming idea for this kind of problem 
seems to have originated with Richard Bellman [6], a paper 
which influenced Kay's work [12], [13], [14]. An extensive 
discussion of precisely the same problem addressed here, but 
with a different approach to its solution, is in [3], [4]. Work by 
Hubert [10], [11], with applications to meteorology, influenced 
Kehagias and co-workers [8], [15], [16], [17], [18], [19], who 
developed a dynamic programming algorithm much like ours, 
for applications such as text segmentation (see also [9]), where 
the raw data are provided in the form of a similarity matrix. 
[22] gives an 0(kN 2 ) dynamic programming algorithm for 
finding the optimal partition of an interval into k blocks, for a 
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given k. See also [20], [2] for related work. Thus while similar 
algorithms have been previously developed, ours finds the 
exact global optimum for any block-additive fitness function, 
automatically determines the number of segments, and can be 
used in either real-time or retrospective analysis. 

II. Dynamic Programming: Finding Optimal 
Partitions 

We describe an 0(N 2 ) algorithm that is guaranteed to solve 
the above problem by finding an exact global optimum, for 
any fitness function V that is additive in the sense of Eq. 
(0}. There is a large (2 N ~ 1 ) but finite number of partitions 
in P*. Dynamic programming [5] is an intelligent method of 
searching this space of all possible solutions. Our algorithm 
can be applied whenever any subpartition of an optimal 
partition is optimal. 

Theorem 1 (Principle of Optimality): Let P max be an op- 
timal partition of / and Pi = {B m , m £ a} be any subset of 
the blocks of p max . Then Pi is an optimal partition of the 
part of / it covers, namely 1\ = [J B m . 

Intuitively this result follows from the contradiction that a 
better subpartition of I\ could be used to construct a partition 
of / better than p max . The proof relies on the fact that 
the block-additivity of the fitness function implies that it is 
also additive on subpartitions. To see this, divide partition 
P into any two disjoint parts, Pi = {B m ,m G a} and 
P 2 = {B mi m € b}, with Pi U P 2 = P and a [j b = M. 
Then the additivity of V yields 

M 

V(P) = 5>(I? m ) 

TCI — 1 

= V(P 1 ) + V(P 2 ). (5) 

Proof 1: As above, denote by P2 the subpartition of p max , 
consisting of the blocks {B m , m € M — a} in p max that are 
not in Pi. Let P3 be any other partition of I\. Since p max is 
an optimal partition of / and P3 U P2 is also a partition of / 
it follows that V(P max ) = V(Pi) + V(P 2 ) > V(P 3 UP 2 ) = 
V(P 3 ) + V(P 2 ) so V(Pi) > V(P 3 ). Thus Pi is an optimal 
partition of 1%, 

Dynamic programming is a recursive procedure that can 
be used to efficiently find the solution to many kinds of 
combinatorial optimization problems. Our algorithm derives 
the optimal partition of the first n + 1 data points using 
previously obtained optimal partitions, i.e. those of the first 
1, 2, ... 71 data points. At each iteration we must consider all 
possible starting locations j, 1 < j < n of the last block of 
the optimal partition. For each putative j the fitness function 
is - by the principle of optimality - the fitness of the optimal 
subpartition prior to j plus the fitness of the last block itself. 
The former was stored at previous iterations, and the latter is 
a simple evaluation of V. The desired new optimal partition 
corresponds to the maximum over all j. 

More precisely, define opt(n) to be the value of the fitness 
function of the optimal partition P™ ax of the first n cells of 



/, for 1 < n < N. The following dynamic programming 
algorithm finds the optimal partition P™ ax : 

1) Define opt(Q) = 

2) Given that opt(j) has been determined for j = 
0,1,..., n: 

« Define end(j,n + 1) = g(Bj, n+ i); Bj n+ i is the 

union of cells j, j + 1, . . . , n + 1 
• Then compute 

opt(n+l) =Max{opt(j-l) + end(j,n+l)}, (6) 

j 

for j = 1, 2, . . . , n + 1. 
« The value of j where this maximum occurs is stored 
as lastchange(n +1). 

3) Repeat 2 until n + 1 = N, when opt(N), the optimal 
partition fitness for all N cells, has been obtained. 

4) Backtrack using the lastchange vector to identify 
the start points of individual blocks of the optimal 
partition p max in the following way. Let n\ = 
lastchange(N), n 2 = lastchange(ni — 1), etc. Then the 
last block in p max contains cells nx,n\ + 1, . . . , N, 
the next-to-last block in p max contains cells n 2 ,U2 + 
1, . . . , n\ — 1, and so on. 

Theorem 2: This deterministic 0(N 2 ) dynamic program- 
ming algorithm finds the partition of / that maximizes the 
(additive) fitness function. 

Proof 2: The proof is by mathematical induction. Clearly 
opt(l) = Max{0 + end(l, 1)} = g(Bi A ) is the fitness of 
the only possible (and therefore optimal) partition of the set 
comprising the first cell. At iteration n + 1, assume not only 
that we have found the optimum partition of P™ ax , but also 
that for i = 1,2, ...,n, we have stored the corresponding 
fitness for this and all previous iterations in the array opt(i), 
and the index of the cell beginning this partition's last block 
in array lastchange (i). Let F(j) = opt(j — 1) + end(j,n+l)\ 
then the principle of optimality shows that when j indexes the 
first cell of the last block of the desired partition P™+ x , F(j) 
is the corresponding maximum fitness. Further, for any j, F(j) 
is the fitness of a legitimate partition of P™_f x , namely that 
consisting of the optimal partition of the cells prior to j 
followed by the single block Bj n+ i. These two facts combine 
to prove that the maximum of F(j) specified in Eq. (|6) gives 
the desired optimum partition at iteration n + 1. Identification 
of the corresponding optimal blocks - starting with the last 
one and working backwards, as in part (4) of the algorithm 
- can be validated with straightforward recursive application 
of the principle of optimality. Finally, since end(j, n + 1) = 
g(Bj, n+1 ) the algorithm requires 1 + 2 + ... + N = 0(N 2 ) 
evaluations of the function g. It also requires 0(N 2 ) additions 
and 0(N 2 ) comparisons in determining the maximums. 

III. Applications 

These results apply to any segmented modeling of 1- 
dimensional data defined by a fitness function that satisfies Eq. 
0. Piecewise constant, or step functions form the most natural 
model class. However the nature of the model depends on 
the application, and many other forms are possible, including 
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piecewise linear and piecewise exponential. The key is that 
the fitness function must not depend on any model parameters 
other than the changepoint locations. We have found excellent 
results with the posterior probability of the model for each 
segment, given the data in that segment, marginalized over 
all parameters but these locations [21]. Using logs turns the 
product posterior (resulting from statistical independence of 
the blocks) into the required additive form Eq.@. 

A comment about smoothness constraints is in order. With 
some fitness functions the algorithm produces the degener- 
ate solution assigning a segment to each data point. In the 
Bayesian setting described above, a natural way to address 
this problem is to adopt a prior distribution for the number 
of segments, for example giving higher weight to smaller 
numbers. Indeed, the geometric prior [7] corresponds to a 
constant term in the fitness function for each block, so there is 
no problem with maintaining additivity. This artifice controls 
model complexity, but without imposing an explicit smooth- 
ness condition with concomitant loss of time resolution. Time 
series features on any time scale, no matter how short, can 
be found if the data support them in a statistically significant 
way. 

Finally, we mention a few sample applications. Implement- 
ing density estimation with piecewise constant Poisson models 
yields histograms in which the bins are not constrained to be 
equal. The number of bins and their sizes and locations are 
determined by the data. The same model provides denoising 
and structure estimation for time series of events or counts of 
events in bins [21]. 

Further, almost all of the results described here can be 
easily extended - almost without change - to data of higher 
dimensionality, as will be described in future papers. In this 
setting cluster analysis can be effected as a post-processing of 
segmented models - piecing the blocks together into clusters 
- and similarly with unsupervised classification and other data 
mining procedures. 

IV. Conclusion 

As we have seen dynamic programming gives a good 
(polynomial) algorithm for finding an optimal partition of 
data on an interval for any fitness function V satisfying 
the additive property [see Eq.@]. Ironically it has the same 
0(N 2 ) complexity as the greedy algorithm. 

In comparing the use of our algorithm to detect and charac- 
terize clusters (collections of blocks) with some of the standard 
clustering techniques [1], we note that our method inherently 
compares partitions that have different numbers of blocks, 
so the number of blocks is automatically determined by the 
data. This is to be contrasted with most standard clustering 
techniques, in which k, the fixed number of clusters must 
be specified ahead of time. One often seeks to minimize 
the maximum diameter (defined as the maximum distance 
between any pair of points in the cluster) of the clusters, or 
to maximize the minimum separation between the clusters. In 
dimension 1, there are well-known 0(kN 2 ) dynamic program- 
ming algorithms for finding the best partitions into k clusters. 
For dimension 2 and higher it is known that these standard 



problems are NP-complete. We don't yet know if our problem 
is NP-complete in dimension 2 and higher. 

In addition, considered as a density estimation or signal 
detection technique, our approach does not introduce any 
explicit smoothing of the data. Structure on any time scale, no 
matter how short, will be detected if it is supported by the data. 
While the parameter in the geometric prior discussed above 
controls to some extent the number of blocks - and thus affects 
the roughness of the optimized model - it is not explicitly a 
smoothing parameter. Another feature is that the incremental 
way the algorithm operates on the data makes a real-time mode 
trivial to implement. This mode has found to be very useful 
in the rapid detection of change points in a data stream. And 
since opt(i + l) is calculated from opt(j),j = 1, 2, . . . , i, some 
of the necessary calculations can be performed as the data are 
still being collected. Also it is easy to modify the dynamic 
programming to yield the optimal partition with blocks of 
a minimum size (each block contains at least d data points, 
for a given positive integer d). These and other features are 
described in more detail at an algorithm repository at: 

http : //astrophysics . arc . nas a . gov/ ~pgaz is/ CodeAr chive Server . html 

We are grateful to Steve Kay, Thanasis Kehagias, and the 
referees for helpful comments and pointers to relevant earlier 
work. 
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